WheaCha: A Method for Explaining the Predictions of Models of Code
Attribution methods have emerged as a popular approach to interpreting model
predictions based on the relevance of input features. Although the feature
importance ranking can provide insights into how models arrive at a prediction
from a raw input, it does not give a clear-cut definition of the key features
models use for the prediction. In this paper, we present a new method, called
WheaCha, for explaining the predictions of code models. Although WheaCha
employs the same mechanism of tracing model predictions back to the input
features, it differs from all existing attribution methods in crucial ways.
Specifically, WheaCha divides an input program into "wheat" (i.e., the defining
features that are the reason a model predicts the label it does) and the rest,
"chaff", for any prediction of a learned code model. We
realize WheaCha in a tool, HuoYan, and use it to explain four prominent code
models: code2vec, seq-GNN, GGNN, and CodeBERT. Results show that (1) HuoYan is
efficient, taking on average under twenty seconds to compute the wheat for an
input program in an end-to-end fashion (i.e., including model prediction time);
(2) the wheat that all models use to predict input programs is made of simple
syntactic or even lexical properties (i.e., identifier names); and (3) based on
wheat, we present a novel approach to explaining the predictions of code models
through the lens of training data.
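The wheat/chaff split can be illustrated with a small sketch. The abstract does not say how HuoYan computes the wheat, so the code below rests on an assumption: it treats wheat as a 1-minimal subset of input tokens that, kept on its own, still makes the model return the same label, and finds it by greedy one-token-at-a-time deletion (a delta-debugging-style reduction). `find_wheat` and `toy_model` are hypothetical names; `toy_model` stands in for a real code model such as code2vec and, mirroring the finding that wheat is often made of identifier names, keys only on lexical cues.

```python
from typing import Callable, Sequence

Tokens = tuple[str, ...]


def find_wheat(tokens: Sequence[str], predict: Callable[[Tokens], str]) -> Tokens:
    """Greedily drop tokens while the model's prediction stays the same.

    The result is 1-minimal: removing any single remaining token changes the
    prediction. It is not guaranteed to be the smallest such subset.
    """
    target = predict(tuple(tokens))
    kept = list(tokens)
    changed = True
    while changed:
        changed = False
        for i in range(len(kept)):
            candidate = tuple(kept[:i] + kept[i + 1:])
            if predict(candidate) == target:
                kept = list(candidate)  # token i is chaff: discard it and restart
                changed = True
                break
    return tuple(kept)


def toy_model(tokens: Tokens) -> str:
    """Hypothetical stand-in for a code model that keys on lexical cues."""
    if "count" in tokens and "+=" in tokens:
        return "increment"
    if "return" in tokens:
        return "getter"
    return "unknown"


program = ("def", "f", "(", "self", ")", ":", "self", ".", "count", "+=", "1")
wheat = find_wheat(program, toy_model)
print(wheat)  # ('count', '+=')
```

Here the tokens `count` and `+=` alone preserve the prediction, so everything else is chaff. Because greedy deletion only guarantees 1-minimality, a different removal order could in general yield a different wheat.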
Infrared: A Meta Bug Detector
The recent breakthroughs in deep learning methods have sparked a wave of
interest in learning-based bug detectors. Because they are learned directly
from data, these bug detectors are easier to create than traditional static
analysis tools. On the other hand, they are difficult to train, requiring a
large amount of data that is not readily available. In this paper, we propose
a new approach, called meta bug detection, which offers three crucial
advantages over existing learning-based bug detectors: bug-type generic (i.e.,
capable of catching the types of bugs that are totally unobserved during
training), self-explainable (i.e., capable of explaining its own prediction
without any external interpretability methods) and sample efficient (i.e.,
requiring substantially less training data than standard bug detectors). Our
extensive evaluation shows that our meta bug detector (MBD) is effective in
catching a variety of bugs, including null pointer dereferences, array index
out-of-bounds errors, file handle leaks, and even data races in concurrent
programs; in the process, MBD also significantly outperforms several noteworthy
baselines, including Facebook Infer, a prominent static analysis tool, and FICS,
the latest anomaly detection method.
Finding Cross-rule Optimization Bugs in Datalog Engines
Datalog is a popular and widely-used declarative logic programming language.
Datalog engines apply many cross-rule optimizations; bugs in them can cause
incorrect results. To detect such optimization bugs, we propose an automated
testing approach called Incremental Rule Evaluation (IRE), which
synergistically tackles the test oracle and test case generation problems. The
core idea behind the test oracle is to compare the results of an optimized
program and a program without cross-rule optimization; any difference indicates
a bug in the Datalog engine. Our core insight is that, for an optimized,
incrementally-generated Datalog program, we can evaluate all rules individually
by constructing a reference program to disable the optimizations that are
performed among multiple rules. Incrementally generating test cases not only
allows us to apply the test oracle to every newly generated rule, but also lets
us ensure that every newly added rule produces a non-empty result with a given
probability and avoid recomputing already-known facts. We implemented IRE as a
tool named Deopt and evaluated Deopt on four mature Datalog engines, namely
Soufflé, CozoDB, μZ, and DDlog, discovering a total of 30 bugs. Of
these, 13 were logic bugs, while the remaining were crash and error bugs. Deopt
can detect all bugs found by queryFuzz, a state-of-the-art approach, whereas
queryFuzz might be unable to detect 5 of the bugs identified by Deopt. Our
incremental test case generation approach is efficient; for example, for test
cases containing 60 rules, our incremental approach can produce 1.17
(for DDlog) to 31.02 (for Soufflé) times as many valid test cases with
non-empty results as the naive random method. We believe that the simplicity
and the generality of the approach will lead to its wide adoption in practice.
Comment: The ACM SIGPLAN Conference on Object Oriented Programming, Systems,
Languages, and Applications (2024), Pasadena, California, United States
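The test oracle can be sketched on a toy Datalog evaluator. This is an illustration under simplifying assumptions, not Deopt's implementation: each newly generated rule is assumed to define a fresh, non-recursive relation, so its reference result is obtained by evaluating that rule alone over the facts already computed (and validated) for the previous program, where no optimization spanning several rules can apply. `evaluate_rule`, `naive_engine`, `buggy_engine`, and `ire_check` are hypothetical names; `buggy_engine` stands in for an engine with a cross-rule bug by evaluating rules in a single reverse-order pass.

```python
Atom = tuple[str, tuple[str, ...]]   # (relation name, variable names)
Rule = tuple[Atom, list[Atom]]       # (head atom, body atoms)
Facts = dict[str, set[tuple]]


def evaluate_rule(rule: Rule, facts: Facts) -> set[tuple]:
    """Evaluate one rule once over the given facts (a conjunctive join)."""
    (_, head_vars), body = rule
    bindings: list[dict] = [{}]
    for rel, vars_ in body:
        extended = []
        for env in bindings:
            for row in facts.get(rel, set()):
                new_env = dict(env)
                # Bind unbound variables; already-bound ones must match the row.
                if all(new_env.setdefault(v, x) == x for v, x in zip(vars_, row)):
                    extended.append(new_env)
        bindings = extended
    return {tuple(env[v] for v in head_vars) for env in bindings}


def naive_engine(edb: Facts, rules: list[Rule]) -> Facts:
    """Correct bottom-up evaluation: apply all rules until a fixpoint."""
    facts = {k: set(v) for k, v in edb.items()}
    changed = True
    while changed:
        changed = False
        for rule in rules:
            rel = rule[0][0]
            new = evaluate_rule(rule, facts) - facts.get(rel, set())
            if new:
                facts.setdefault(rel, set()).update(new)
                changed = True
    return facts


def buggy_engine(edb: Facts, rules: list[Rule]) -> Facts:
    """Hypothetical engine under test: one pass in reverse rule order, so a
    rule that depends on another rule's output sees an empty relation."""
    facts = {k: set(v) for k, v in edb.items()}
    for rule in reversed(rules):
        facts.setdefault(rule[0][0], set()).update(evaluate_rule(rule, facts))
    return facts


def ire_check(engine, edb: Facts, rules: list[Rule]) -> bool:
    """Compare the engine's result for the newest rule with a reference that
    evaluates that rule alone over the previous program's facts."""
    *old_rules, new_rule = rules
    known = engine(edb, old_rules)          # assumed validated in earlier steps
    expected = evaluate_rule(new_rule, known)
    actual = engine(edb, rules).get(new_rule[0][0], set())
    return actual == expected


edb: Facts = {"edge": {(1, 2), (2, 3)}}
r1: Rule = (("path2", ("X", "Z")), [("edge", ("X", "Y")), ("edge", ("Y", "Z"))])
r2: Rule = (("src", ("X",)), [("path2", ("X", "Z"))])

for engine in (naive_engine, buggy_engine):
    print(engine.__name__, ire_check(engine, edb, [r1]), ire_check(engine, edb, [r1, r2]))
# naive_engine True True
# buggy_engine True False
```

The first rule passes on both engines; adding `r2`, which depends on the relation derived by `r1`, exposes the discrepancy in `buggy_engine`. Checking after every added rule is what lets the oracle localize a bug to the rule that triggered it.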
Automatic Detection, Validation and Repair of Race Conditions in Interrupt-Driven Embedded Software
Interrupt-driven programs are widely deployed in safety-critical embedded
systems to perform hardware- and resource-dependent data operation tasks. The
frequent use of interrupts in these systems can cause race conditions to occur
due to interactions between application tasks and interrupt handlers (or between
two interrupt handlers). Numerous program analysis and testing techniques have been
proposed to detect races in multithreaded programs. Little work, however, has
addressed race condition problems related to hardware interrupts. In this
paper, we present SDRacer, an automated framework that can detect, validate and
repair race conditions in interrupt-driven embedded software. It uses a
combination of static analysis and symbolic execution to generate input data
for exercising the potential races. It then employs virtual platforms to
dynamically validate these races by forcing the interrupts to occur at the
potential racing points. Finally, it provides repair candidates to eliminate
the detected races. We evaluate SDRacer on nine real-world embedded programs
written in C. The results show that SDRacer can precisely detect and
successfully fix race conditions.
Comment: This is a draft version of the published paper. Ke Wang provides
suggestions for improving the paper and the README of the GitHub repository
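The validation step, forcing an interrupt at each potential racing point, can be mimicked in a few lines. This is a pure-Python simulation under stated assumptions rather than SDRacer's virtual-platform approach: the task performs a non-atomic read-modify-write of a shared `counter`, the hypothetical handler `isr` also writes it, and `outcomes` injects the handler before or after every step of the task. More than one final value confirms the race. The repair modeled here, masking interrupts inside the critical section, is one common fix and is an assumption, not a detail taken from the paper.

```python
from typing import Callable

State = dict[str, int]
Step = Callable[[State], None]


def task_steps() -> list[Step]:
    """Non-atomic `counter += 1` in a task, split into preemptible steps."""
    def load(s: State) -> None:
        s["tmp"] = s["counter"]

    def add(s: State) -> None:
        s["tmp"] += 1

    def store(s: State) -> None:
        s["counter"] = s["tmp"]

    return [load, add, store]


def isr(s: State) -> None:
    """Hypothetical interrupt handler that also updates the shared variable."""
    s["counter"] += 10


def run(steps: list[Step], interrupt_at: int) -> int:
    """Run the task, forcing the ISR just before step `interrupt_at`
    (or after the last step when interrupt_at == len(steps))."""
    s: State = {"counter": 0, "tmp": 0}
    for i, step in enumerate(steps):
        if i == interrupt_at:
            isr(s)
        step(s)
    if interrupt_at == len(steps):
        isr(s)
    return s["counter"]


def outcomes(steps: list[Step], masked: frozenset[int] = frozenset()) -> set[int]:
    """Final values of `counter` over all interrupt points not masked."""
    return {run(steps, p) for p in range(len(steps) + 1) if p not in masked}


steps = task_steps()
print(outcomes(steps))                     # {1, 11}: lost update, race confirmed
print(outcomes(steps, frozenset({1, 2})))  # {11}: interrupts masked inside
```

An interrupt between `load` and `store` overwrites the handler's update (final value 1 instead of 11). Masking the two interior points, as disabling interrupts around the critical section would, leaves a single outcome.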
Model-Based Security Testing
Security testing aims at validating software system requirements related to
security properties like confidentiality, integrity, authentication,
authorization, availability, and non-repudiation. Although security testing
techniques have been available for many years, there have been few approaches
that allow test cases to be specified at a higher level of abstraction, that
guide test identification and specification, or that support automated test
generation.
Model-based security testing (MBST) is a relatively new field that is especially
dedicated to the systematic and efficient specification and documentation of
security test objectives, security test cases and test suites, as well as to
their automated or semi-automated generation. In particular, the combination of
security modelling and test generation approaches is still a challenge in
research and of high interest for industrial applications. MBST includes, for
example, security functional testing, model-based fuzzing, risk- and threat-oriented
testing, and the use of security test patterns. This paper surveys MBST
techniques and the related models and presents samples of new methods and
tools under development in the European ITEA2 project DIAMONDS.
Comment: In Proceedings MBT 2012, arXiv:1202.582
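To make automated test generation from a model concrete, the sketch below derives abstract security test cases from a small state-machine model. It is a generic transition-coverage generator, not a method from DIAMONDS: the login-service model in `MODEL`, including the requirement that reading a secret while logged out leads to `denied`, is hypothetical, and each generated test is a shortest input sequence that exercises one transition, paired with the state the system is expected to reach.

```python
from collections import deque

# Hypothetical security model of a login service: (state, input) -> next state.
MODEL: dict[tuple[str, str], str] = {
    ("logged_out", "login_ok"): "logged_in",
    ("logged_out", "login_bad"): "logged_out",
    ("logged_in", "read_secret"): "logged_in",
    ("logged_in", "logout"): "logged_out",
    ("logged_out", "read_secret"): "denied",  # security requirement
}


def shortest_paths(model: dict[tuple[str, str], str], start: str) -> dict[str, list[str]]:
    """Breadth-first search: shortest input sequence reaching each state."""
    paths: dict[str, list[str]] = {start: []}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for (src, inp), dst in model.items():
            if src == state and dst not in paths:
                paths[dst] = paths[state] + [inp]
                queue.append(dst)
    return paths


def transition_tests(model: dict[tuple[str, str], str], start: str) -> list[tuple[list[str], str]]:
    """One abstract test per reachable transition: (inputs, expected state)."""
    paths = shortest_paths(model, start)
    return [(paths[src] + [inp], dst)
            for (src, inp), dst in model.items() if src in paths]


for inputs, expected in transition_tests(MODEL, "logged_out"):
    print(" -> ".join(inputs), "| expect:", expected)
```

The last generated test checks a negative security requirement (access without authentication must be denied), the kind of security functional test listed above; a concrete test harness would still have to map each abstract input to a real request.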
The Geochemical Features and Genesis of Ferromanganese Deposits from Caiwei Guyot, Northwestern Pacific Ocean
The ferromanganese deposit is a type of marine mineral resource rich in Mn, Fe, Co, Ni, and Cu. Its growth process is generally multi-stage, and the guyot environment and the geochemical characteristics of seawater strongly affect it. Here, we use scanning electron microscopy (SEM), X-ray diffraction (XRD), inductively coupled plasma optical emission spectrometry (ICP-OES), X-ray fluorescence (XRF), and inductively coupled plasma mass spectrometry (ICP-MS) to analyze the textural morphology, microstructure, and mineralogical and geochemical features of ferromanganese deposits at different locations on Caiwei Guyot. The ferromanganese deposits of Caiwei Guyot consist of ferromanganese nodules on the slope and board ferromanganese crusts on the mountaintop edge, both of hydrogenetic origin, which indicates that the metals were sourced from oxic seawater. Global palaeo-ocean events control the geochemical composition and growth process of the ferromanganese crusts and the nodule. The ferromanganese crusts that formed from the Late Cretaceous onward on the mountaintop edge have a rough surface with black botryoidal shapes, indicating an environment with strong hydrodynamic conditions, while the ferromanganese nodule that formed from the Miocene onward on the slope has an oolitic surface as a result of water depth. Moreover, nanoscale or micron-scale diagenesis may occur during the growth process, affecting the microstructural, mineralogical, and geochemical features.